L a Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values 1

نویسندگان

  • Makoto Iwayama
  • Tokunaga
چکیده

Text categorization is the classication of documents with respect to a set of predened categories. In this paper, we propose a new probabilistic model for text categorization, that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages; 1) it considers within-document term frequencies, 2) considers term weighting for target documents, and 3) is less aected by having insucient training cases. We verify our model's superiority over the others in the task of categorizing news articles from the \Wall Street Journal".

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Probabilistic Model for Text Categorization: Based on a Single Random Variable with Multiple Values

Text categorization is the classification of documents with respect to a set of predefined categories. In this paper, we propose a new probabilistic model for text categorization, that is based on a Single random Variable with Multiple Values (SVMV). Compared to previous probabilistic models, our model has the following advantages; 1) it considers within-document term frequencies, 2) considers ...

متن کامل

Support vector regression with random output variable and probabilistic constraints

Support Vector Regression (SVR) solves regression problems based on the concept of Support Vector Machine (SVM). In this paper, a new model of SVR with probabilistic constraints is proposed that any of output data and bias are considered the random variables with uniform probability functions. Using the new proposed method, the optimal hyperplane regression can be obtained by solving a quadrati...

متن کامل

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

متن کامل

Improving the Operation of Text Categorization Systems with Selecting Proper Features Based on PSO-LA

With the explosive growth in amount of information, it is highly required to utilize tools and methods in order to search, filter and manage resources. One of the major problems in text classification relates to the high dimensional feature spaces. Therefore, the main goal of text classification is to reduce the dimensionality of features space. There are many feature selection methods. However...

متن کامل

A discrete particle swarm optimization algorithm with local search for a production-based two-echelon single-vendor multiple-buyer supply chain

This paper formulates a two-echelon single-producer multi-buyer supply chain model, while a single product is produced and transported to the buyers by the producer. The producer and the buyers apply vendor-managed inventory mode of operation. It is assumed that the producer applies economic production quantity policy, which implies a constant production rate at the producer. The operational pa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994